Parallel Machine Learning Using Concurrency Control
Author: Xinghao Pan
Abstract
By Xinghao Pan, Doctor of Philosophy in Computer Science and the Designated Emphasis in Communications, Computation and Statistics, University of California, Berkeley. Professor Michael I. Jordan, Chair.

Many machine learning algorithms iteratively process datapoints and transform global model parameters. As processor speeds fail to keep up with the growth in dataset sizes, it has become increasingly impractical to execute such iterative algorithms serially. To address this problem, the machine learning community has turned to two parallelization strategies: bulk synchronous parallel (BSP) and coordination-free. BSP algorithms partition computational work among workers, with occasional synchronization at global barriers, but have only been applied to ‘embarrassingly parallel’ problems where the work is trivially factorizable. Coordination-free algorithms simply allow concurrent processors to execute in parallel, interleaving transformations and possibly introducing inconsistencies. Theoretical analysis is then required to prove that the coordination-free algorithm produces a reasonable approximation of the desired outcome, under assumptions on the problem and the system.

In this dissertation, we propose and explore a third approach: applying concurrency control to manage parallel transformations in machine learning algorithms. We identify points of possible interference between parallel iterations by examining the semantics of the serial algorithm. Coordination is then introduced to either avoid or resolve such conflicts, while non-conflicting transformations are allowed to execute concurrently. Our parallel algorithms are thus engineered to produce exactly the same output as the serial machine learning algorithm, preserving the serial algorithm’s theoretical guarantees of correctness while maximizing concurrency. We demonstrate the feasibility of our approach by parallelizing a variety of machine learning algorithms, including nonparametric unsupervised learning, graph clustering, discrete optimization, and sparse convex optimization. We theoretically prove and empirically verify that our parallel algorithms produce output equivalent to their serial counterparts. We also theoretically analyze the expected concurrency of our parallel algorithms and empirically demonstrate their scalability.
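To make the approach concrete, the following is a minimal, hypothetical sketch of pessimistic concurrency control for a parallel iterative update, assuming each transformation only reads and writes a small, known set of model coordinates. The names (LockedModel, transform, the toy additive update) are illustrative and not taken from the dissertation: transformations that touch overlapping coordinates are serialized by per-coordinate locks, while transformations on disjoint coordinates proceed concurrently.

```python
# Hypothetical sketch of pessimistic concurrency control for parallel
# iterative updates. Names and the toy update rule are illustrative only.
import threading
from concurrent.futures import ThreadPoolExecutor

class LockedModel:
    """Global model parameters guarded by one lock per coordinate."""
    def __init__(self, num_coords):
        self.params = [0.0] * num_coords
        self.locks = [threading.Lock() for _ in range(num_coords)]

    def transform(self, coords, update_fn):
        # Acquire locks in sorted order to avoid deadlock. Transformations
        # touching disjoint coordinate sets run fully in parallel; only
        # conflicting transformations are serialized.
        for c in sorted(coords):
            self.locks[c].acquire()
        try:
            for c in coords:
                self.params[c] = update_fn(c, self.params[c])
        finally:
            for c in coords:
                self.locks[c].release()

def process(model, datapoint):
    # Each datapoint can only interfere with the coordinates it touches.
    model.transform(datapoint["coords"], lambda c, v: v + datapoint["step"])

model = LockedModel(num_coords=8)
data = [{"coords": [0, 3], "step": 1.0},
        {"coords": [5], "step": -0.5},
        {"coords": [3, 5], "step": 2.0}]
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(lambda d: process(model, d), data))
print(model.params)
```

Because the toy additive updates commute, this locked parallel execution yields the same final parameters as a serial pass over the data; the dissertation establishes this kind of serial equivalence for much richer transformations.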
Similar papers
Forward kinematic analysis of planar parallel robots using a neural network-based approach optimized by machine learning
The forward kinematic problem of parallel robots has long been considered a challenge in the field due to the resulting nonlinear system of equations. In this paper, the forward kinematic problem of planar parallel robots in their workspace is investigated using a neural-network-based approach. In order to increase the accuracy of this method, the workspace of the parallel robo...
Optimistic Concurrency Control for Distributed Unsupervised Learning
Research on distributed machine learning algorithms has focused primarily on one of two extremes—algorithms that obey strict concurrency constraints or algorithms that obey few or no such constraints. We consider an intermediate alternative in which algorithms optimistically assume that conflicts are unlikely and if conflicts do arise a conflict-resolution protocol is invoked. We view this “opt...
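In the same spirit, here is a small, hypothetical validate-then-commit sketch of the optimistic pattern described above, not the paper's actual protocol: workers compute updates lock-free from a snapshot, conflicts are detected at commit time by comparing per-coordinate versions, and a simple retry serves as the conflict-resolution step. OptimisticModel, try_commit, and optimistic_update are assumed names.

```python
# Hypothetical optimistic concurrency control sketch: lock-free compute,
# version-based validation at commit, retry on conflict. Illustrative only.
import threading

class OptimisticModel:
    def __init__(self, num_coords):
        self.params = [0.0] * num_coords
        self.versions = [0] * num_coords        # bumped on every commit
        self._commit_lock = threading.Lock()    # guards snapshot/commit only

    def read(self, coords):
        # Take a consistent snapshot of (value, version) for each coordinate.
        with self._commit_lock:
            return {c: (self.params[c], self.versions[c]) for c in coords}

    def try_commit(self, snapshot, new_values):
        with self._commit_lock:
            # Validate: fail if any read coordinate changed since the snapshot.
            if any(self.versions[c] != v for c, (_, v) in snapshot.items()):
                return False
            for c, val in new_values.items():
                self.params[c] = val
                self.versions[c] += 1
            return True

def optimistic_update(model, coords, step):
    while True:                                  # retry is the conflict resolution
        snap = model.read(coords)
        proposal = {c: val + step for c, (val, _) in snap.items()}
        if model.try_commit(snap, proposal):
            return

model = OptimisticModel(num_coords=4)
threads = [threading.Thread(target=optimistic_update, args=(model, [0, 2], 1.0))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(model.params)   # [8.0, 0.0, 8.0, 0.0] regardless of interleaving
```

When conflicts are rare, almost every update commits on its first attempt, so the expensive computation runs without any coordination; only the brief validate-and-write step is serialized.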
Parallel Double Greedy Submodular Maximization
Many machine learning problems can be reduced to the maximization of submodular functions. Although well understood in the serial setting, the parallel maximization of submodular functions remains an open area of research with recent results [1] only addressing monotone functions. The optimal algorithm for maximizing the more general class of non-monotone submodular functions was introduced by ...
On Mining Concurrency Defect-Related Reports from Bug Repositories
We present early findings of two ongoing case studies in which we automatically extract reports about concurrency defects from the MySQL and Apache bug repositories. To mine the unstructured reports, we apply keyword search and machine learning, using linear and non-linear classifiers. We analyze the results in detail and suggest some improvements for this mining task. Automated Bug Report Clas...
Two-stage fuzzy-stochastic programming for parallel machine scheduling problem with machine deterioration and operator learning effect
This paper deals with determining the number of machines and the production schedules in manufacturing environments. In this line, a two-stage fuzzy-stochastic programming model with fuzzy processing times is discussed, where both deterioration and learning effects are evaluated simultaneously. The first stage focuses on the type and number of machines in order to minimize the total costs associat...
Publication year: 2017